Continent

Column 1

Waffle plot of journal paper percentages, by continent (each square = 1% of data)

Table of journal paper percentages, by continent

Column 2

Context

Representativity of First Authors in Psychology

A large proportion of first authors in psychology are located in North America or Europe, mostly in the US (Thalmayer et al., 2021, Arnett, 2008). In this dashboard, I present some aggregated data by continent, country, year, and journal (for first authors only).

* In this table and all following tables, the percentages are calculated after excluding missing values. The Missing column shows the real percentage of missing values.

Method & Data

The data from this report include information about publications from one of six psychology journals (Developmental Psychology, Journal of Personality and Social Psychology, Journal of Abnormal Psychology, Journal of Family Psychology, Health Psychology, and Journal of Educational Psychology) for years 2020 to 2023. They include information about the articles (e.g., title, abstract) as well as on the authors, such as university of affiliation. I have obtained these data from PubMed using the PubMed API through the easyPubMed package. I have determined the country of the first author of each paper based on the affiliation address by matching the university name with a world university names database obtained from GitHub.

* In this table and all following tables, the percentages are calculated after excluding missing values. The Missing column shows the real percentage of missing values.

Missing data

Some of the papers were missing address information; in many cases, the PubMed API provided only the department and no university. It was not possible to identify the country in these cases (one would need to look at the actual papers one by one to make manual corrections). Furthermore, some university names from the data did not match the university name database obtained from GitHub. In some cases, I have brought manual corrections to university names in an attempt to reduce the number of missing values.

* In this table and all following tables, the percentages are calculated after excluding missing values. The Missing column shows the real percentage of missing values.

Next Steps

Possible future steps include: (a) obtaining a better, more current university name database (that includes country of university), (b) making manual corrections for other research institutes not included in the university database, and (c) automatically updating the data every week or so.

* In this table and all following tables, the percentages are calculated after excluding missing values. The Missing column shows the real percentage of missing values.

Waffle gif

Column 1

Waffle plot of journal paper percentages, by continent and year (each square = 1% of data)

Column 2

Table of journal paper percentages, by continent

* In this table and all following tables, the percentages are calculated after excluding missing values. The Missing column shows the real percentage of missing values.

Dygraph (monthly)

Column 1

Time series plot of journal paper percentages, by continent and year (each square = 1% of data)

Column 2

Table of journal paper percentages, by continent

* In this table and all following tables, the percentages are calculated after excluding missing values. The Missing column shows the real percentage of missing values.

Dygraph (yearly)

Column 1

Time series plot of journal paper percentages, by continent and year (each square = 1% of data)

Column 2

Table of journal paper percentages, by continent

* In this table and all following tables, the percentages are calculated after excluding missing values. The Missing column shows the real percentage of missing values.

Scatter plot (plotly)

Column 1

Scatter plot of journal paper percentages, by continent and year (each square = 1% of data)

Column 2

Table of journal paper percentages, by continent

* In this table and all following tables, the percentages are calculated after excluding missing values. The Missing column shows the real percentage of missing values.

Scatter plot (rempsyc scatter smooth)

Column 1

Scatter plot of journal paper percentages, by continent and year (each square = 1% of data)

Column 2

Table of journal paper percentages, by continent

* In this table and all following tables, the percentages are calculated after excluding missing values. The Missing column shows the real percentage of missing values.

---
title: "Busara Dashboard"
author: "Rémi Thériault"
output:
  flexdashboard::flex_dashboard:
    orientation: columns
    vertical_layout: fill
    social: menu
    source_code: embed
    # theme: lumen
    storyboard: false
---

```{r setup, include=FALSE}
library(flexdashboard)
```

```{r}
# Load packages
library(easyPubMed)
library(dplyr)
library(purrr)
library(stringr)
library(stringi)
library(DT)
library(fuzzyjoin)
library(countrycode)
library(keyring)
library(Ecfun)
library(tidyr)
library(waffle)
library(plotly)
library(ggplot2)
library(gganimate)
library(animation)
library(knitr)
library(lubridate)
library(xts)
library(tibble)
library(dygraphs)
library(rempsyc)

articles.df4 <- readRDS("articles.df4.rds")
```

```{r, eval=T}
waffle2 <- function(parts, rows=10, keep=TRUE, xlab=NULL, title=NULL, colors=NA,
                    size=2, flip=FALSE, reverse=FALSE, equal=TRUE, pad=0,
                    use_glyph = FALSE,
                    glyph_size = 12,
                    glyph_font = "Font Awesome 5 Free Solid",
                    glyph_font_family = "FontAwesome5Free-Solid",
                    legend_pos = "right") {
  if (inherits(parts, "data.frame")) {
    stats::setNames(
      unlist(parts[, 2], use.names = FALSE),
      unlist(parts[, 1], use.names = FALSE)
    ) -> parts
  }

  # fill in any missing names
  part_names <- names(parts)
  if (length(part_names) < length(parts)) {
    part_names <- c(part_names, LETTERS[1:length(parts) - length(part_names)])
  }

  names(parts) <- part_names

  # use Set2 if no colors are specified
  if (all(is.na(colors))) colors <- suppressWarnings(RColorBrewer::brewer.pal(length(parts), "Set2"))

  # make one big vector of all the bits
  parts_vec <- unlist(sapply(1:length(parts), function(i) {
    rep(names(parts)[i], parts[i])
  }))

  if (reverse) parts_vec <- rev(parts_vec)

  # setup the data frame for geom_rect
  dat <- expand.grid(y = 1:rows, x = seq_len(pad + (ceiling(sum(parts) / rows))))

  # add NAs if needed to fill in the "rectangle"
  dat$value <- c(parts_vec, rep(NA, nrow(dat) - length(parts_vec)))

  if (!inherits(use_glyph, "logical")) {

    if (length(use_glyph) == 1L) {

      if (grepl("wesom", glyph_font)) {
        fontlab <- .fa_unicode[.fa_unicode[["name"]] == use_glyph, "unicode"]
        dat$fontlab <- c(
          rep(fontlab, length(parts_vec)),
          rep("", nrow(dat) - length(parts_vec)
          # rep(NA, nrow(dat) - length(parts_vec)
          )
        )
      } else {
        dat$fontlab <- c(
          rep(use_glyph, length(parts_vec)),
          rep("", nrow(dat) - length(parts_vec)
          # rep(NA, nrow(dat) - length(parts_vec)
          )
        )
      }

    } else if (length(use_glyph) == length(parts)) {

      if (grepl("wesom", glyph_font)) {
        fontlab <- .fa_unicode[.fa_unicode[["name"]] %in% use_glyph, "unicode"]
        # fontlab <- .fa_unicode[use_glyph]
        dat$fontlab <- c(
          fontlab[as.numeric(factor(parts_vec, levels = names(parts)))],
          rep("", nrow(dat) - length(parts_vec))
          # rep(NA, nrow(dat) - length(parts_vec))
        )
      } else {
        dat$fontlab <- c(
          use_glyph[as.numeric(factor(parts_vec, levels = names(parts)))],
          # rep(NA, nrow(dat) - length(parts_vec))
          rep("", nrow(dat) - length(parts_vec))
        )
      }

    } else if (length(use_glyph) == length(parts_vec)) {

      if (grepl("wesom", glyph_font)) {
        fontlab <- .fa_unicode[.fa_unicode[["name"]] %in% use_glyph, "unicode"]
        dat$fontlab <- c(fontlab, rep(NA, nrow(dat) - length(parts_vec)))
      } else {
        dat$fontlab <- c(use_glyph, rep(NA, nrow(dat) - length(parts_vec)))
      }

    } else {
      stop("'use_glyph' must have length 1, length(parts), or sum(parts)")
    }
  }

  dat$value <- ifelse(is.na(dat$value), " ", dat$value)

  if (" " %in% dat$value) part_names <- c(part_names, " ")
  if (" " %in% dat$value) colors <- c(colors, "#00000000")

  dat$value <- factor(dat$value, levels = part_names)

  gg <- ggplot(dat, aes(x = x, y = y))

  if (flip) gg <- ggplot(dat, aes(x = y, y = x))

  gg <- gg + theme_bw()

  # make the plot

  if (inherits(use_glyph, "logical")) {

    gg <- gg + geom_tile(aes(fill = value), color = "white", size = size)

    gg <- gg + scale_fill_manual(
      name = "",
      values = colors,
      label = part_names,
      na.value = "white",
      drop = !keep
    )

    gg <- gg + guides(fill = guide_legend(override.aes = list(colour = "#00000000")))

    gg <- gg + theme(legend.background =
                       element_rect(fill = "#00000000", color = "#00000000"))

    gg <- gg + theme(legend.key =
                       element_rect(fill = "#00000000", color = "#00000000"))

  } else {

    if (extrafont::choose_font(glyph_font, quiet = TRUE) == "") {
      stop(
        sprintf(
          "Font [%s] not found. Please install it and use extrafont to make it available to R",
          glyph_font
        ),
        call. = FALSE
      )
    }

    load_fontawesome()

    gg <- gg + geom_tile(
      color = "#00000000", fill = "#00000000", size = size,
      alpha = 0, show.legend = FALSE
    )

    gg <- gg + geom_point(
      aes(color = value), fill = "#00000000", size = 0,
      show.legend = TRUE
    )

    gg <- gg + geom_text(
      aes(color = value, label = fontlab),
      family = glyph_font_family,
      size = glyph_size,
      show.legend = FALSE
    )

    gg <- gg + scale_color_manual(
      name = NULL,
      values = colors,
      labels = part_names,
      drop = !keep
    )

    gg <- gg + guides(color =
                        guide_legend(override.aes = list(shape = 15, size = 7)))

    gg <- gg + theme(legend.background =
                       element_rect(fill = "#00000000", color = "#00000000"))

    gg <- gg + theme(legend.key = element_rect(color = "#00000000"))
  }

  gg <- gg + labs(x = xlab, y = NULL, title = title)
  gg <- gg + scale_x_continuous(expand = c(0, 0))
  gg <- gg + scale_y_continuous(expand = c(0, 0))

  if (equal) gg <- gg + coord_equal()

  gg <- gg + theme(panel.grid = element_blank())
  gg <- gg + theme(panel.border = element_blank())
  gg <- gg + theme(panel.background = element_blank())
  gg <- gg + theme(panel.spacing = unit(0, "null"))

  gg <- gg + theme(axis.text = element_blank())
  gg <- gg + theme(axis.title.x = element_text(size = 10))
  gg <- gg + theme(axis.ticks = element_blank())
  gg <- gg + theme(axis.line = element_blank())
  gg <- gg + theme(axis.ticks.length = unit(0, "null"))

  gg <- gg + theme(plot.title = element_text(size = 18))

  gg <- gg + theme(plot.background = element_blank())
  # gg <- gg + theme(panel.spacing = unit(c(0, 0, 0, 0), "null"))
  
  gg <- gg + theme(legend.position = legend_pos)

  gg

}
```

# Continent

## Column 1 {data-width=2150}

### Waffle plot of journal paper percentages, by continent (each square = 1% of data) {data-height=600}

```{r continent_table_overall}
df.continent <- articles.df4 %>% 
  mutate(missing = sum(is.na(continent))/n()) %>% 
  filter(!is.na(continent)) %>% 
  summarize(Papers = n(),
            `North America` = sum(continent == "Northern America")/n(),
            Europe = sum(continent == "Europe")/n(),
            Oceania = sum(continent == "Oceania")/n(),
            Asia = sum(continent == "Asia")/n(),
            `Latin America` = sum(continent == "Latin America and the Caribbean")/n(),
            Africa = sum(continent == "Africa")/n(),
            Missing = first(missing),
            ) %>% 
  mutate(across(`North America`:Missing, ~ .x * 100)) # %>% 
  # mutate(across(`North America`:Missing, ~ round(.x, 2))) %>% 
  # rename_with(str_to_title) %>%
  # rename(`Missing*` = Missing) %>% 
  # datatable(#extensions = 'Responsive',
  #           options = list(searching = FALSE, paging = FALSE),
  #           caption = "Journal paper percentages, by continent")

data.waffle <- set_names(as.numeric(t(df.continent[2:7])), names(df.continent[2:7]))

waffle2(data.waffle, legend_pos = "right") #+ # rows = 5, 
  #coord_cartesian(ylim=c(0,2))#%>% 
  #ggplotly()

```

### Table of journal paper percentages, by continent {data-height=200}

```{r}
df.continent  %>%
  mutate(across(`North America`:Missing, ~ round(.x, 2))) %>%
  rename_with(str_to_title) %>%
  rename(`Missing*` = Missing) %>%
  datatable(#extensions = 'Responsive',
            options = list(searching = FALSE, paging = FALSE),
            caption = "Journal paper percentages, by continent")
```

## Column 2 {.tabset .tabset-fade}

### Context

**Representativity of First Authors in Psychology**

A large proportion of first authors in psychology are located in North America or Europe, mostly in the US (Thalmayer et al., 2021, Arnett, 2008). In this dashboard, I present some aggregated data by continent, country, year, and journal (for first authors only).

> \* In this table and all following tables, the percentages are calculated after excluding missing values. The *Missing* column shows the real percentage of missing values.

### Method & Data

The data from this report include information about publications from one of six psychology journals (*Developmental Psychology*, *Journal of Personality and Social Psychology*, *Journal of Abnormal Psychology*, *Journal of Family Psychology*, *Health Psychology*, and *Journal of Educational Psychology*) for years 2020 to 2023. They include information about the articles (e.g., title, abstract) as well as on the authors, such as university of affiliation. I have obtained these data from PubMed using the PubMed API through the `easyPubMed` package. I have determined the country of the first author of each paper based on the affiliation address by matching the university name with a world university names database obtained from GitHub.

> \* In this table and all following tables, the percentages are calculated after excluding missing values. The *Missing* column shows the real percentage of missing values.

### Missing data

Some of the papers were missing address information; in many cases, the PubMed API provided only the department and no university. It was not possible to identify the country in these cases (one would need to look at the actual papers one by one to make manual corrections). Furthermore, some university names from the data did not match the university name database obtained from GitHub. In some cases, I have brought manual corrections to university names in an attempt to reduce the number of missing values.

> \* In this table and all following tables, the percentages are calculated after excluding missing values. The *Missing* column shows the real percentage of missing values.

### Next Steps

Possible future steps include: (a) obtaining a better, more current university name database (that includes country of university), (b) making manual corrections for other research institutes not included in the university database, and (c) automatically updating the data every week or so.

> \* In this table and all following tables, the percentages are calculated after excluding missing values. The *Missing* column shows the real percentage of missing values.

# Waffle gif

## Column 1 {data-width=800}

### Waffle plot of journal paper percentages, by continent and year (each square = 1% of data) {data-height=600}

```{r, include=FALSE}
df.continent2 <- articles.df4 %>% 
  mutate(missing = sum(is.na(continent))/n()) %>% 
  filter(!is.na(continent)) %>% 
  group_by(year) %>% 
  summarize(`North America` = sum(continent == "Northern America")/n(),
            Europe = sum(continent == "Europe")/n(),
            Oceania = sum(continent == "Oceania")/n(),
            Asia = sum(continent == "Asia")/n(),
            `Latin America` = sum(continent == "Latin America and the Caribbean")/n(),
            Africa = sum(continent == "Africa")/n(),
            ) %>% 
  mutate(across(2:6, ~ .x * 100)) %>% 
  arrange(year)

df.continent3 <- df.continent2 %>%
  pivot_longer(-year, names_to = "continent", values_to = "number") %>%
  mutate(year = as.numeric(year))

saveGIF({
  for (i in rev(unique(df.continent3$year))) {
    d1 = df.continent3 %>% filter(year==i)
    gg1 <- d1 %>%
      ggplot(aes(fill = continent, values = number)) +
      geom_waffle(color = "white",
                  na.rm = TRUE) +
      scale_y_continuous(expand=c(0,0)) +
      ggthemes::scale_fill_tableau(name = NULL) +
      coord_equal() +
      theme_minimal(base_size=40) +
      theme(panel.grid = element_blank(), 
            axis.text = element_blank(), 
            legend.position = "bottom",
            plot.title=element_text(hjust=0.5)) +
      labs(title=i)
    print(gg1)
  }
}, movie.name="waffle.gif", ani.height=500, ani.width=500*3)

```

```{r}
include_graphics("waffle.gif") 

# waffle2(data.waffle, legend_pos = "right") +
#   transition_time(year)
```


## Column 2

### Table of journal paper percentages, by continent {data-height=200}

```{r}
continent.paper.missing <- articles.df4 %>% 
  group_by(year) %>% 
  summarize(Missing = sum(is.na(continent))/n()) %>% 
  pull(Missing)

articles.df4 %>% 
  mutate(missing = sum(is.na(continent))/n()) %>% 
  filter(!is.na(continent)) %>% 
  group_by(year) %>% 
  summarize(Papers = n(),
            `North America` = sum(continent == "Northern America")/n(),
            Europe = sum(continent == "Europe")/n(),
            Oceania = sum(continent == "Oceania")/n(),
            Asia = sum(continent == "Asia")/n(),
            `Latin America` = sum(continent == "Latin America and the Caribbean")/n(),
            Africa = sum(continent == "Africa")/n(),
            Missing = first(missing),
            ) %>% 
  mutate(Missing = continent.paper.missing[-1],
         across(`North America`:Missing, ~ round(.x, 2))) %>% 
  arrange(desc(year)) %>% 
  rename_with(str_to_title) %>% 
  datatable(caption = "Journal paper percentages, by continent and year")
```

> \* In this table and all following tables, the percentages are calculated after excluding missing values. The *Missing* column shows the real percentage of missing values.

# Dygraph (monthly)

## Column 1 {data-width=800}

### Time series plot of journal paper percentages, by continent and year (each square = 1% of data) {data-height=600}

```{r}
df.continent4 <- articles.df4 %>% 
  mutate(missing = sum(is.na(continent))/n(),
         date = floor_date(date, "month")) %>% 
  filter(!is.na(continent)) %>% 
  group_by(date) %>% 
  summarize(`North America` = sum(continent == "Northern America")/n(),
            Europe = sum(continent == "Europe")/n(),
            Oceania = sum(continent == "Oceania")/n(),
            Asia = sum(continent == "Asia")/n(),
            `Latin America` = sum(continent == "Latin America and the Caribbean")/n(),
            Africa = sum(continent == "Africa")/n(),
            ) %>% 
  mutate(across(2:6, ~ .x * 100)) %>% 
  arrange(date)

q <- df.continent4 %>% 
  column_to_rownames(var = "date") %>% 
  as.xts

dygraph(q) %>% 
  dyRangeSelector() %>% 
  dyUnzoom() %>% 
  dyCrosshair(direction = "vertical")

```

## Column 2

### Table of journal paper percentages, by continent {data-height=200}

```{r}
continent.paper.missing <- articles.df4 %>% 
  group_by(year) %>% 
  summarize(Missing = sum(is.na(continent))/n()) %>% 
  pull(Missing)

articles.df4 %>% 
  mutate(missing = sum(is.na(continent))/n()) %>% 
  filter(!is.na(continent)) %>% 
  group_by(year) %>% 
  summarize(Papers = n(),
            `North America` = sum(continent == "Northern America")/n(),
            Europe = sum(continent == "Europe")/n(),
            Oceania = sum(continent == "Oceania")/n(),
            Asia = sum(continent == "Asia")/n(),
            `Latin America` = sum(continent == "Latin America and the Caribbean")/n(),
            Africa = sum(continent == "Africa")/n(),
            Missing = first(missing),
            ) %>% 
  mutate(Missing = continent.paper.missing[-1],
         across(`North America`:Missing, ~ round(.x, 2))) %>% 
  arrange(desc(year)) %>% 
  rename_with(str_to_title) %>% 
  datatable(caption = "Journal paper percentages, by continent and year")
```

> \* In this table and all following tables, the percentages are calculated after excluding missing values. The *Missing* column shows the real percentage of missing values.

# Dygraph (yearly)

## Column 1 {data-width=800}

### Time series plot of journal paper percentages, by continent and year (each square = 1% of data) {data-height=600}

```{r}
q <- df.continent2 %>% 
  mutate(year = as.Date(year, "%Y")) %>% 
  column_to_rownames(var = "year") %>% 
  as.xts

dygraph(q) %>% 
  dyRangeSelector() %>% 
  dyUnzoom() %>% 
  dyCrosshair(direction = "vertical")

```

## Column 2

### Table of journal paper percentages, by continent {data-height=200}

```{r}
continent.paper.missing <- articles.df4 %>% 
  group_by(year) %>% 
  summarize(Missing = sum(is.na(continent))/n()) %>% 
  pull(Missing)

articles.df4 %>% 
  mutate(missing = sum(is.na(continent))/n()) %>% 
  filter(!is.na(continent)) %>% 
  group_by(year) %>% 
  summarize(Papers = n(),
            `North America` = sum(continent == "Northern America")/n(),
            Europe = sum(continent == "Europe")/n(),
            Oceania = sum(continent == "Oceania")/n(),
            Asia = sum(continent == "Asia")/n(),
            `Latin America` = sum(continent == "Latin America and the Caribbean")/n(),
            Africa = sum(continent == "Africa")/n(),
            Missing = first(missing),
            ) %>% 
  mutate(Missing = continent.paper.missing[-1],
         across(`North America`:Missing, ~ round(.x, 2))) %>% 
  arrange(desc(year)) %>% 
  rename_with(str_to_title) %>% 
  datatable(caption = "Journal paper percentages, by continent and year")
```

> \* In this table and all following tables, the percentages are calculated after excluding missing values. The *Missing* column shows the real percentage of missing values.

# Scatter plot (plotly)

## Column 1 {data-width=800}

### Scatter plot of journal paper percentages, by continent and year (each square = 1% of data) {data-height=600}

```{r}
nice_scatter(df.continent3, "year", "number", group = "continent", ytitle = "% of All Papers") %>% 
  ggplotly()

```

## Column 2

### Table of journal paper percentages, by continent {data-height=200}

```{r}
continent.paper.missing <- articles.df4 %>% 
  group_by(year) %>% 
  summarize(Missing = sum(is.na(continent))/n()) %>% 
  pull(Missing)

articles.df4 %>% 
  mutate(missing = sum(is.na(continent))/n()) %>% 
  filter(!is.na(continent)) %>% 
  group_by(year) %>% 
  summarize(Papers = n(),
            `North America` = sum(continent == "Northern America")/n(),
            Europe = sum(continent == "Europe")/n(),
            Oceania = sum(continent == "Oceania")/n(),
            Asia = sum(continent == "Asia")/n(),
            `Latin America` = sum(continent == "Latin America and the Caribbean")/n(),
            Africa = sum(continent == "Africa")/n(),
            Missing = first(missing),
            ) %>% 
  mutate(Missing = continent.paper.missing[-1],
         across(`North America`:Missing, ~ round(.x, 2))) %>% 
  arrange(desc(year)) %>% 
  rename_with(str_to_title) %>% 
  datatable(caption = "Journal paper percentages, by continent and year")
```

> \* In this table and all following tables, the percentages are calculated after excluding missing values. The *Missing* column shows the real percentage of missing values.


# Scatter plot (rempsyc scatter smooth)

## Column 1 {data-width=800}

### Scatter plot of journal paper percentages, by continent and year (each square = 1% of data) {data-height=600}

```{r, fig.width=12, fig.height=7}
nice_scatter(df.continent3, "year", "number", group = "continent",
             ytitle = "% of All Papers", method = "loess") #%>% 
  # ggplotly()

```

## Column 2

### Table of journal paper percentages, by continent {data-height=200}

```{r}
continent.paper.missing <- articles.df4 %>% 
  group_by(year) %>% 
  summarize(Missing = sum(is.na(continent))/n()) %>% 
  pull(Missing)

articles.df4 %>% 
  mutate(missing = sum(is.na(continent))/n()) %>% 
  filter(!is.na(continent)) %>% 
  group_by(year) %>% 
  summarize(Papers = n(),
            `North America` = sum(continent == "Northern America")/n(),
            Europe = sum(continent == "Europe")/n(),
            Oceania = sum(continent == "Oceania")/n(),
            Asia = sum(continent == "Asia")/n(),
            `Latin America` = sum(continent == "Latin America and the Caribbean")/n(),
            Africa = sum(continent == "Africa")/n(),
            Missing = first(missing),
            ) %>% 
  mutate(Missing = continent.paper.missing[-1],
         across(`North America`:Missing, ~ round(.x, 2))) %>% 
  arrange(desc(year)) %>% 
  rename_with(str_to_title) %>% 
  datatable(caption = "Journal paper percentages, by continent and year")
```

> \* In this table and all following tables, the percentages are calculated after excluding missing values. The *Missing* column shows the real percentage of missing values.